I. Introduction

Towards Understanding the Gap Between Reported Traffic Conditions and Actual Traffic Injuries

For our final project, we wanted to examine how and whether 311 cases, and specifically mobility-related 311 cases, could be a useful predictor of a city’s high-injury network. We use Philadelphia as our test city, as there is already a comprehensive high-injury network (HIN) defined and released by the city that we were able to easily analyze. High-injury networks are generally defined as streets that pose higher risk for accidents and injuries to occur, across all modes of transportation. Some cities discern high-injury networks for each mode, such as varying HINs for pedestrian or cyclist injuries, but Philadelphia’s is meant to be comprehensive for all modes. According to the City of Philadelphia’s Vision Zero Action Plan for 2025, 80% of injuries occur on just 12% of city streets; Philadelphia’s high-injury network was formed around data on crash-related injuries and fatalities.

Our original idea for this project was to create a high-injury network for a city that doesn’t have one yet (Pittsburgh), and then compare it to qualitative (311) and demographic data to understand the value of citizen reporting and the injury burden on certain groups. However, we found that both defining and analyzing a HIN for a new city would not be achievable within the time frame, and moved to instead applying the project’s initial idea for 311-related and demographic analysis to Philadelphia’s already-established network.

We seek to build upon existing research on 311 reporting. Significant research has been done on 311 reporting in Boston, using a mixed methods approach that is equal parts social science and anthropology. O’Brien et. al. found that “higher attachment with the space was associated with a greater motivation to enforce local social norms and benefit the broader community.” [1] With this framework in place, the use of mobility-related 311 cases in comparison to actual traffic injuries allowed us to focus in on places not only where reporting is low, but where reporting does not reflect actual conditions. We also take a more socioeconomic approach - seeking to connect the fascinating behavioral dynamics studied by O’Brien et. al. and connect it to variables like commute mode, homeownership, and poverty.

We define mobility-related 311 cases as those that relate to street conditions.

df <- tibble(
  mobility = c("Abandoned Vehicle",
                              "Complaint (Streets)",
                              "Dangerous Sidewalk",
                              "Line Striping",
                              "Other (Streets)",
                              "Right of Way Unit",
                              "Right of Way",
                              "Salting",
                              "Stop Sign Repair",
                              "Street Defect",
                              "Street Paving",
                              "Street Trees",
                              "Traffic (Other)",
                              "Traffic Signal Emergency"))


kable(df,
      caption="311 Cases Classified as Mobility Cases")
311 Cases Classified as Mobility Cases
mobility
Abandoned Vehicle
Complaint (Streets)
Dangerous Sidewalk
Line Striping
Other (Streets)
Right of Way Unit
Right of Way
Salting
Stop Sign Repair
Street Defect
Street Paving
Street Trees
Traffic (Other)
Traffic Signal Emergency

In this study, we explore the correlations between demographic data, 311 cases, and the presence of high-injury networks, and develop predictive models for metrics around these networks.

II. Data Preparation and Exploratory Analysis

Data

To examine demographic data, we utilize 2019 American Community Survey 5-Year estimates for census tracts. 311 Data is also pulled from 2019.

#ONLY NEED TO RUN THIS BLOCK IF RE-DOING ACS DATA

get_acs_csv <- function(yr1, variableList, nameList) {
  
  acs1 <- get_acs(geography = "tract",
                   year = yr1,
                   variables = variableList,
                   geometry = TRUE,
                   state  = "PA",
                   county = "Philadelphia",
                   output = "wide"
  )
  
  acs_full <- acs1 %>% dplyr:: select(!ends_with("M")) #include this line to remove margins of error
  names(acs_full) <- c("GEOID","NAME", nameList,"geometry")
  #acs_full <- acs_full[order(acs_full$GEOID),]
  #write.csv(acs_full, file=paste(deparse(substitute(variableList)), yr1,"-",yr2, ".csv")) #include this line to export a csv
  return(acs_full)
}

#use this line to look at acs variables
#variables19_5 <- load_variables(year = 2019,dataset = "acs5")


variableList <- c("B01001_001", #total population
              "B01001_002", #total male
              "B01001_026", #total female
              "B02001_002", #white alone
              "B02001_003", #black alone
              "B02001_004", #AmIn/AlNative alone
              "B02001_005", #Asian alone
              "B02001_006", #Native hawaiian/PI alone
              "B02001_007", #some other race alone
              "B02001_008", #two plus races
              "B03003_003", #total Hispanic or Latino
              "B05009_001", #under 18 years B05009_001 for 2010, B09001_001 for 2019
              "B15001_001", #18 years + B15001_001 for 2010, B09021_001 for 2019
              "B11003_001", #families with children under 18
              "B08301_001", #workers
              "B08301_002", #means of transportation to work: car
              "B08301_010", #means of transportation to work: public transit
              "B08301_018", #means of transportation to work: bicycle
              "B08301_019", #means of transportation to work: walked
              "B08301_021", #means of transportation to work: wfh
              "B19013_001", #median income
              "B17001_002", #Income in the past 12 months below poverty level
              "B17001_031", #Income in the past 12 months at or above poverty level
              "B25064_001", #median gross rent
              "B25003_002", #owner occupied
              "B25003_003" #renter occupied
              )

nameList <- c("pop",
                "male",
                "fem",
                "white",
                "black",
                "native_am",
                "asian",
                "nh_pi",
                "other",
                "multi",
                "hisp_lat",
                "Under_18",
                "Above_18",
                "fam_child",
                "workers",
                "Car",
                "Transit",
                "Bicycle",
                "Walking",
                "WFH",
                "mdn_inc",
                "blw_pov",
                "abv_pov",
                "med_rent",
              "own",
              "rent")

acs_data <- get_acs_csv(2019,variableList, nameList)
# add percentages
acs_data <- acs_data %>% 
  mutate(PCT_m = male/pop,
         PCT_f = fem/pop,
         PCT_wht = white/pop,
         PCT_blk = black/pop,
         PCT_na = native_am/pop,
         PCT_asn = asian/pop,
         PCT_pi = nh_pi/pop,
         PCT_oth = other/pop,
         PCT_mlt = multi/pop,
         PCT_hsp = hisp_lat/pop,
         PCT_fam = Under_18/pop,
         PCT_car = Car/workers,
         PCT_trn = Transit/workers,
         PCT_bik = Bicycle/workers,
         PCT_wlk = Walking/workers,
         PCT_wfh = WFH/workers,
         PCT_pov = blw_pov/pop,
         PCT_own = own/pop,
         PCT_rent = rent/pop,
         maj_wht = if_else(PCT_wht > 0.50, 1, 0),
         maj_blk = if_else(PCT_blk > 0.50, 1, 0))

acs_data <- st_transform(acs_data,crs=4326)

#this line writes the data to a shapefile
st_write(acs_data,"./data/acs_bg.shp", driver="ESRI Shapefile", append=FALSE)

Spatial Structure

Because we need to combine 311 data, which is point data, ACS data, which is formatted in polygons, and HIN and street data, which is made up of lines (that cross many ACS polygons), we had to consider the best way to integrate both forms. We decided to evaluate HINs through street intersections; to transform the street network into polygons, we created polygons that center at and are equidistant between each intersection. After this, we joined 311 point data to these intersection polygons, then joined ACS data by transferring the data from whichever ACS polygon had the largest overlap to each intersection polygon.

There are a few issues with this structure; for one, the intersections polygons created vary greatly in size due to differing block lengths, especially near parks and the Schuylkill river. We attempted to alleviate this by clipping polygons based on the Schuylkill river. Second, ACS data will not be entirely exact, as there may be differing tracts that overlap each intersection polygon. Despite these concerns, this format is fairly accurate and proved to be easier to work with than other methods of data combination.

Figure 1. Intersection Polygons

#ONLY NEED TO RUN THIS BLOCK IF RE-DOING ACS DATA

#setwd("G:/My Drive/GrSchool/CPLN 505 Planning by Numbers/PBN_Final")

all311 <- read_sf("./data/311_2019_allSelected.shp")

m311 <- all311 %>% filter(service_na %in%
                            c("Abandoned Vehicle",
                              "Complaint (Streets)",
                              "Dangerous Sidewalk",
                              "Line Striping",
                              "Other (Streets)",
                              "Right of Way Unit",
                              "Right of Way",
                              "Salting",
                              "Stop Sign Repair",
                              "Street Defect",
                              "Street Paving",
                              "Street Trees",
                              "Traffic (Other)",
                              "Traffic Signal Emergency"))

st_write(m311,"./data/m311.shp", driver="Esri Shapefile", append=FALSE)

hin <- read_sf("./data/high_injury_network_2020.shp")
streets <- read_sf("./data/CompleteStreets.shp")
ip <- read_sf("./data/intersection_polygons_clip.shp")

crash <- read_sf("./data/COLLISION_CRASH_2016_2020.shp")
crash_fatal <- crash %>% filter(FATAL_COUN > 0)
crash_injury <- crash %>% filter(INJURY_COU > 0)

ip2 <- ip %>% 
  mutate(all311 = lengths(st_intersects(.,all311)),
         m311 = lengths(st_intersects(.,m311)),
         hin_c=lengths(st_intersects(.,hin)),
         hin = if_else(hin_c > 0, 1,0),
         crsh_c=lengths(st_intersects(.,crash)),
         fatal =if_else(lengths(st_intersects(.,crash_fatal)) > 0, 1, 0),
         fatal_c =lengths(st_intersects(.,crash_fatal)),
         inj = if_else(lengths(st_intersects(.,crash_injury)) > 0, 1,0),
         inj_c = lengths(st_intersects(.,crash_injury))
         )
st_write(ip2,"./data/ip.shp", driver="ESRI Shapefile", append=FALSE)

#ip <- st_join(ip,streets,join=st_intersects)
#this line joins polygons to acs data, but takes too long so just do this via spatial join in GIS
#ip <- st_join(ip,acs_data,join=st_intersects, largest=TRUE)

#write.csv(ip_join, file="./data/ip_join.csv")
dat <- read_sf("./data/ip_acs_join.shp")

dat <- select(dat, -c(TARGET_FID, Id, Input_FID, OBJECTID, NODE_ID))

dat <- dat %>% 
  mutate(lm311 = if_else(m311 < 4, 1, 0),
         logm311 = log(m311),
         gap = if_else(m311<4 & hin==1, 1,0)
         )

dat_bg <- read_sf("./data/ip_acs_join_bg.shp")

dat_bg <- select(dat_bg, -c(TARGET_FID, Id, Input_FID, OBJECTID, NODE_ID))

dat_bg <- dat_bg %>% 
  mutate(lm311 = if_else(m311 < 4, 1, 0),
         logm311 = log(m311),
         gap = if_else(m311<4 & hin==1, 1,0)
         )

#only need these if mapping
# hin <- read_sf("https://github.com/zoenyoo/PBN_Final/raw/main/data/hin.geojson")
# streets <- read_sf("https://github.com/zoenyoo/PBN_Final/raw/main/data/CompleteStreets.geojson")
# ip <- read_sf("https://github.com/zoenyoo/PBN_Final/raw/main/data/intersection_polygons_clip.geojson")
# m311 <- read_sf("https://github.com/zoenyoo/PBN_Final/raw/main/data/m311.geojson")

# ggplot() +
#   geom_point(data = m311, aes(x = lon, y = lat), alpha = .05)
# 
# ggplot()+
#   geom_sf(data=dat, show.legend = NA, color = palette5[5], size=0.25)

Exploratory Analysis

Summary Tables of Continuous Variables

We use summary statistics to get a basic idea of three varying cases: intersections with high mobility cases that are in the HIN, intersections with low mobility cases in the HIN, and intersections with high mobility cases that are not in the HIN. Even with these tables, it is clear that there is some variation; Table 4 illustrates averages for a few variables of interest.

dat1 <- dat %>% st_drop_geometry() %>% filter(m311>3 & hin==1) #high mobility cases and in the hin
dat2 <- dat %>% st_drop_geometry() %>% filter(m311<=3 & hin==1) #low mobility cases and in the hin
dat3 <- dat %>% st_drop_geometry() %>% filter(m311>3 & hin==0) #high mobility cases and not in the hin

dat1 %>% summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>% kable(caption="Table 1. High Mobility Cases in the HIN") %>%
  kable_styling("striped", full_width = F) %>% scroll_box(width = "1000px", height = "150px")
Table 1. High Mobility Cases in the HIN
Join_Count all311 m311 hin_c hin crsh_c fatal fatal_c inj inj_c pop male fem white black native_am asian nh_pi other multi hisp_lat Under_18 Above_18 fam_child workers Car Transit Bicycle Walking WFH mdn_inc blw_pov abv_pov med_rent own rent PCT_m PCT_f PCT_wht PCT_blk PCT_na PCT_asn PCT_pi PCT_oth PCT_mlt PCT_hsp PCT_fam PCT_car PCT_trn PCT_bik PCT_wlk PCT_wfh PCT_pov PCT_own PCT_rent maj_wht maj_blk lm311 logm311 gap
1.657815 9.101669 7.756449 1.189681 1 8.08953 0.1062215 0.1236722 0.8710167 6.031866 4523.993 2122.212 2401.781 1448.404 2167.28 21.6305 309.5083 2.830046 437.8998 136.4408 903.3528 991.6055 3453.4 906.2693 1822.61 1030.714 498.3103 31.57132 150.758 73.12747 41563.42 1270.546 3117.763 1029.032 841.9256 810.7762 0.4600813 0.5179157 0.3126062 0.482271 0.0045639 0.0656056 0.0007335 0.0825589 0.0296578 0.1749373 0.2008989 0.5344043 0.2817445 0.0162121 0.0848734 0.0401989 0.2702027 0.1786143 0.1866427 0.2488619 0.4658574 0 1.911576 0
dat2 %>% summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>% kable(caption="Table 2. Low Mobility Cases in the HIN") %>%
  kable_styling("striped", full_width = F) %>% scroll_box(width = "1000px", height = "150px")
Table 2. Low Mobility Cases in the HIN
Join_Count all311 m311 hin_c hin crsh_c fatal fatal_c inj inj_c pop male fem white black native_am asian nh_pi other multi hisp_lat Under_18 Above_18 fam_child workers Car Transit Bicycle Walking WFH mdn_inc blw_pov abv_pov med_rent own rent PCT_m PCT_f PCT_wht PCT_blk PCT_na PCT_asn PCT_pi PCT_oth PCT_mlt PCT_hsp PCT_fam PCT_car PCT_trn PCT_bik PCT_wlk PCT_wfh PCT_pov PCT_own PCT_rent maj_wht maj_blk lm311 logm311 gap
1.627287 1.929196 1.18576 1.118536 1 5.61257 0.0799523 0.089101 0.7983294 4.129674 4403.953 2076.395 2327.558 1530.323 1958.685 16.83691 325.3341 2.604216 433.2331 136.9375 880.1348 945.8345 3383.691 880.1714 1783.377 1041.418 455.3874 25.32379 154.3926 71.55847 42014.55 1222.102 3029.864 1019.444 798.2693 807.4614 0.4626325 0.5158878 0.3401637 0.4481873 0.0038957 0.0704639 0.0006076 0.084716 0.030486 0.1773042 0.1969008 0.5463029 0.2688997 0.0141351 0.0889295 0.0402165 0.2704805 0.1732773 0.1918049 0.2895784 0.4089101 1 -Inf 1
dat3 %>% summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>% kable(caption="Table 3. High Mobility Cases NOT in the HIN") %>%
  kable_styling("striped", full_width = F) %>% scroll_box(width = "1000px", height = "150px")
Table 3. High Mobility Cases NOT in the HIN
Join_Count all311 m311 hin_c hin crsh_c fatal fatal_c inj inj_c pop male fem white black native_am asian nh_pi other multi hisp_lat Under_18 Above_18 fam_child workers Car Transit Bicycle Walking WFH mdn_inc blw_pov abv_pov med_rent own rent PCT_m PCT_f PCT_wht PCT_blk PCT_na PCT_asn PCT_pi PCT_oth PCT_mlt PCT_hsp PCT_fam PCT_car PCT_trn PCT_bik PCT_wlk PCT_wfh PCT_pov PCT_own PCT_rent maj_wht maj_blk lm311 logm311 gap
1.365113 9.036145 7.46307 0 0 2.041121 0.0065479 0.0065479 0.5301205 1.454165 4750.637 2259.248 2491.389 1863.155 2092.983 17.31456 302.2412 2.163436 321.3965 151.3837 733.5254 1021.575 3651.174 991.3075 2060.079 1186.894 548.9002 52.31823 149.572 86.87428 48607.51 1148.94 3525.859 1055.483 990.3332 792.4419 0.4729311 0.5220925 0.3906827 0.4512141 0.0035128 0.0589335 0.0005067 0.0586803 0.0314934 0.1377884 0.2037315 0.5616817 0.2800231 0.0227735 0.0706095 0.0422709 0.2371694 0.2068883 0.1759053 0.3863279 0.4313777 0 1.88581 0
# HIN_dat <- filter(dat, hin == 1)
# mcase_dat <- filter(dat, m311 > 3)
# HIN_d <- density(HIN_dat$PCT_own)
# mcase_d <- density(mcase_dat$PCT_own)
# acs_d <- density(dat$PCT_own, na.rm=TRUE)
# plot(HIN_d, ylim=c(0,7))
# plot(mcase_d, ylim=c(0,7))
# plot(acs_d, ylim=c(0,7))

Table 4.

III. Correlation Tests

numericVars <-
  select_if(dat %>% st_drop_geometry(), is.numeric) %>%
  select(Join_Count, all311, m311, hin, hin_c, crsh_c, fatal, fatal_c, inj, inj_c, pop, starts_with("PCT")) %>%
  na.omit()
ggcorrplot(  #correlation plot
          round(cor(numericVars), 1),
          p.mat = cor_pmat(numericVars),
          show.diag = TRUE,
          lab = TRUE,
          colors = c("#08519c", "white", "#FA342A"),
          type="lower",
          insig = "blank",
          ) +
          labs(title = "Figure 2. Correlations of Demographic Data, HIN Data, and 311 Data",
               subtitle = "Intersection Polygons, Philadelphia, PA") 

To initially examine correlation, we utilize a correlation plot for all numeric variables. As we can see from the correlation plot, there are many demographic variables that exhibit correlation between each other, but when looking at correlation to the presence of the HIN, there are no strongly correlated variables outside of crash-related variables.

Graphic Comparisons

Figure 3. Income

Figure 3 compares the relative densities of high-injury intersections and intersections with a high number of mobility cases with median income (by census tract). The majority of high-injury intersections with a high number of mobility cases are at this peak around $30,000 median income, and after a breaking point of about $50,000, 311 cases become more frequently reported as incomes rise.

Figure 4. Percentage White

Figure 4, which changes the x-axis to percentage of white residents, illustrates two points: one, High-injury intersections are much more concentrated in less-white census tracts. Two, we see that as the percentage of white residents per tract rises, reporting of (mobility-related) 311 cases become relatively more concentrated.

Figure 5. Homeownership

Figure 5 utilizes home ownership as a comparison. High-injury intersections do peak at slightly lower percentage rates of homeownership; additionally, similar to the past two comparisons, reporting of mobility cases becomes more dense than High-Injury Intersections as the rate of homeownership rises.

IV. Binary Logit Modeling

To pick variables from the dataset that may be useful for our models, we use backward stepwise regression.

#backward stepwise regression
hin_test <- glm(hin ~ pop + mdn_inc + med_rent + workers + PCT_f + PCT_wht + maj_wht + PCT_blk + maj_blk + PCT_asn + PCT_na + PCT_pi + PCT_oth + PCT_mlt + PCT_hsp + PCT_car + PCT_trn + PCT_bik + PCT_wlk + PCT_wfh + PCT_own + PCT_pov + PCT_fam + all311 + m311,
            data = dat %>% st_drop_geometry(),
            family="binomial" (link="logit"))

lm311_test <- glm(lm311 ~ pop + mdn_inc + med_rent + workers + PCT_f + PCT_wht + maj_wht + PCT_blk + maj_blk + PCT_asn + PCT_na + PCT_pi + PCT_oth + PCT_mlt + PCT_hsp + PCT_car + PCT_trn + PCT_bik + PCT_wlk + PCT_wfh + PCT_own + PCT_pov + PCT_fam,
            data = dat %>% st_drop_geometry(),
            family="binomial" (link="logit"))

# check this for na values
gap_test <- glm(lm311 ~ pop + mdn_inc + med_rent + workers + PCT_f + PCT_wht + maj_wht + PCT_blk + maj_blk + PCT_asn + PCT_na + PCT_pi + PCT_oth + PCT_mlt + PCT_hsp + PCT_car + PCT_trn + PCT_bik + PCT_wlk + PCT_wfh + PCT_own + PCT_pov + PCT_fam,
            data = dat %>% st_drop_geometry() %>% filter(hin==1),
            family="binomial" (link="logit"))

drop1(hin_test, test="Chisq")
## Single term deletions
## 
## Model:
## hin ~ pop + mdn_inc + med_rent + workers + PCT_f + PCT_wht + 
##     maj_wht + PCT_blk + maj_blk + PCT_asn + PCT_na + PCT_pi + 
##     PCT_oth + PCT_mlt + PCT_hsp + PCT_car + PCT_trn + PCT_bik + 
##     PCT_wlk + PCT_wfh + PCT_own + PCT_pov + PCT_fam + all311 + 
##     m311
##          Df Deviance   AIC     LRT  Pr(>Chi)    
## <none>         16860 16912                      
## pop       1    16862 16912   1.738  0.187355    
## mdn_inc   1    16866 16916   5.968  0.014566 *  
## med_rent  1    16892 16942  32.388 1.263e-08 ***
## workers   1    16860 16910   0.144  0.704068    
## PCT_f     1    16865 16915   5.083  0.024168 *  
## PCT_wht   1    16860 16910   0.050  0.822303    
## maj_wht   1    16860 16910   0.408  0.522956    
## PCT_blk   1    16862 16912   1.741  0.186989    
## maj_blk   1    16860 16910   0.087  0.768116    
## PCT_asn   1    16866 16916   6.309  0.012012 *  
## PCT_na    1    16860 16910   0.481  0.487833    
## PCT_pi    1    16860 16910   0.487  0.485421    
## PCT_oth   1    16877 16927  16.961 3.815e-05 ***
## PCT_mlt   1    16860 16910   0.301  0.583329    
## PCT_hsp   1    16861 16911   1.365  0.242691    
## PCT_car   1    16863 16913   3.313  0.068716 .  
## PCT_trn   1    16863 16913   2.876  0.089918 .  
## PCT_bik   1    16891 16941  30.678 3.046e-08 ***
## PCT_wlk   1    16860 16910   0.272  0.602235    
## PCT_wfh   1    16867 16917   6.999  0.008157 ** 
## PCT_own   1    16965 17015 104.899 < 2.2e-16 ***
## PCT_pov   1    16861 16911   0.756  0.384612    
## PCT_fam   1    16878 16928  18.159 2.032e-05 ***
## all311    1    16913 16963  52.615 4.059e-13 ***
## m311      1    16937 16987  76.663 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#step(gap_test, direction="backward")
#step(hin_test, direction="backward")

anova(hin_test, test="Chisq")
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: hin
## 
## Terms added sequentially (first to last)
## 
## 
##          Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                     16865      18076              
## pop       1    14.82     16864      18061 0.0001181 ***
## mdn_inc   1   338.36     16863      17723 < 2.2e-16 ***
## med_rent  1   154.50     16862      17568 < 2.2e-16 ***
## workers   1    14.05     16861      17554 0.0001785 ***
## PCT_f     1    17.37     16860      17537 3.076e-05 ***
## PCT_wht   1    63.15     16859      17474 1.915e-15 ***
## maj_wht   1    15.89     16858      17458 6.728e-05 ***
## PCT_blk   1   215.96     16857      17242 < 2.2e-16 ***
## maj_blk   1     4.85     16856      17237 0.0276612 *  
## PCT_asn   1     0.55     16855      17237 0.4587719    
## PCT_na    1     0.00     16854      17237 0.9602461    
## PCT_pi    1     0.01     16853      17237 0.9242424    
## PCT_oth   1    11.36     16852      17225 0.0007515 ***
## PCT_mlt   1    14.16     16851      17211 0.0001676 ***
## PCT_hsp   1     3.32     16850      17208 0.0682871 .  
## PCT_car   1    30.77     16849      17177 2.899e-08 ***
## PCT_trn   1    14.58     16848      17162 0.0001341 ***
## PCT_bik   1    63.87     16847      17098 1.327e-15 ***
## PCT_wlk   1    25.58     16846      17073 4.252e-07 ***
## PCT_wfh   1     3.08     16845      17070 0.0793406 .  
## PCT_own   1   104.29     16844      16966 < 2.2e-16 ***
## PCT_pov   1     4.00     16843      16962 0.0453982 *  
## PCT_fam   1    17.94     16842      16944 2.284e-05 ***
## all311    1     6.97     16841      16937 0.0082880 ** 
## m311      1    76.66     16840      16860 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on AIC values, it appears that mobility-related 311 cases do not improve the model particularly; however, we still include them to get an idea of how mobility-related 311 cases are related.

High Injury Networks

Our first models predict whether an intersection is part of the High-Injury Network.

#our honed model
hinMod1 <- glm(hin ~ med_rent + maj_blk + PCT_car + PCT_bik + PCT_own + m311,
               data=dat %>% st_drop_geometry(),
               family="binomial" (link="logit"))

#model based on backwards stepwise
hinMod2 <- glm(hin ~ mdn_inc + med_rent + PCT_f + PCT_blk + PCT_asn + 
    PCT_oth + PCT_hsp + PCT_car + PCT_trn + PCT_bik + PCT_wfh + 
    PCT_own + PCT_fam + all311 + m311,
               data=dat %>% st_drop_geometry(),
               family="binomial" (link="logit"))

#model just using crash and injury/fatal data
hinMod3 <- glm(hin ~ crsh_c + inj_c + fatal_c,
               data=dat %>% st_drop_geometry(),
               family="binomial" (link="logit"))

hinModb <- glm(hin ~ crsh_c + inj_c + fatal_c,
               data=dat_bg %>% st_drop_geometry(),
               family="binomial" (link="logit"))

stargazer(hinMod1, hinMod2, hinMod3,
          type="html", title="Table 5. High-Injury Network Intersections",
          column.labels = c("Model 1", "Model 2", "Model 3"),
          single.row = TRUE, digits=2)
Table 5. High-Injury Network Intersections
Dependent variable:
hin
Model 1 Model 2 Model 3
(1) (2) (3)
mdn_inc -0.0000** (0.0000)
med_rent 0.0002*** (0.0001) 0.001*** (0.0001)
maj_blk 0.11*** (0.04)
PCT_f 0.90** (0.37)
PCT_blk 1.06*** (0.12)
PCT_asn 2.03*** (0.29)
PCT_oth 4.62*** (0.60)
PCT_hsp -0.62* (0.36)
PCT_car -0.43*** (0.13) -1.38*** (0.21)
PCT_trn -1.31*** (0.28)
PCT_bik -6.76*** (0.77) -6.64*** (0.81)
PCT_wfh -2.91*** (0.67)
PCT_own -6.22*** (0.32) -3.95*** (0.37)
PCT_fam -2.02*** (0.35)
all311 -0.09*** (0.01)
m311 0.03*** (0.004) 0.12*** (0.01)
crsh_c 0.15*** (0.02)
inj_c 0.12*** (0.03)
fatal_c 1.57*** (0.14)
Constant -0.05 (0.10) -0.37*** (0.14) -1.97*** (0.03)
Observations 16,866 16,866 16,866
Log Likelihood -8,643.19 -8,435.43 -7,487.66
Akaike Inf. Crit. 17,300.38 16,902.86 14,983.32
Note: p<0.1; p<0.05; p<0.01
data.frame(100*(exp(coef(hinMod1)))-1) %>% 
  kable()
X100….exp.coef.hinMod1……1
(Intercept) 94.5429368
med_rent 99.0235609
maj_blk 110.7776230
PCT_car 64.2446404
PCT_bik -0.8844418
PCT_own -0.8009256
m311 101.6840253

We develop three models that are shown in Table 4: Model 2 is the model using most variables suggested by backwards stepwise regression, and Model 3 uses just crash-related data. Model 1 is our most refined model using just demographic variables, both binary and continuous, and a dummy variable for mobility-related 311 cases. As shown by the AIC, Model 3 is the most useful model for predicting the HIN, which is clear due to the fact that the Philadelphia HIN is based around crash data. To contrast, Model 2 is concentrated around specific variables of high significance that we were interested in. For example, as Figure 6 illustrates, our model examines the relationship between the percentage of black residents and presence of the High Injury Network. Based on this model, high injury intersections are 110% more likely to be in majority black tracts.

Figure 6. HIN and Majority Black Census Tracts

Low Reporting of Mobility Cases

These models predict for intersections that have mobility case reporting of 3 or fewer.

# our honed model
l311Mod1 <- glm(lm311 ~ med_rent + maj_blk + PCT_car + PCT_bik + PCT_own + PCT_pov,
               data=dat %>% st_drop_geometry(),
               family="binomial" (link="logit"))


l311Modb <- glm(lm311 ~ med_rent + maj_blk + PCT_car + PCT_bik + PCT_own + PCT_pov,
               data=dat_bg %>% st_drop_geometry(),
               family="binomial" (link="logit"))
#model based on backwards stepwise
l311Mod2 <- glm(formula = lm311 ~ mdn_inc + med_rent + workers + PCT_f + 
    PCT_blk + maj_blk + PCT_hsp + PCT_trn + PCT_bik + PCT_wlk + 
    PCT_wfh + PCT_own + PCT_pov + PCT_fam, family = binomial(link = "logit"), 
    data = dat %>% st_drop_geometry())

stargazer(l311Mod1, l311Mod2,
          type="html", title="Table 6. Low Mobility Case-Reporting Intersections",
          column.labels = c("Model 1", "Model 2"),
          single.row = TRUE, digits=2)
Table 6. Low Mobility Case-Reporting Intersections
Dependent variable:
lm311
Model 1 Model 2
(1) (2)
mdn_inc 0.0000*** (0.0000)
med_rent -0.0002*** (0.0001) -0.0004*** (0.0001)
workers -0.0002*** (0.0000)
PCT_f 1.99*** (0.34)
PCT_blk -1.00*** (0.15)
maj_blk -0.23*** (0.04) 0.22** (0.10)
PCT_car 0.61*** (0.12)
PCT_hsp -0.86*** (0.14)
PCT_trn -1.54*** (0.19)
PCT_bik -2.30*** (0.57) -4.43*** (0.58)
PCT_wlk -0.47* (0.26)
PCT_wfh -1.24** (0.56)
PCT_own -0.90*** (0.30) -0.55 (0.34)
PCT_pov -0.84*** (0.15) 1.14*** (0.25)
PCT_fam -1.72*** (0.34)
Constant 1.21*** (0.12) 1.53*** (0.15)
Observations 16,866 16,866
Log Likelihood -10,269.11 -10,135.47
Akaike Inf. Crit. 20,552.21 20,300.93
Note: p<0.1; p<0.05; p<0.01
data.frame(100*(exp(coef(l311Mod1)))-1) %>% 
  kable()
X100….exp.coef.l311Mod1……1
(Intercept) 335.232596
med_rent 98.979345
maj_blk 78.360239
PCT_car 183.300971
PCT_bik 8.996052
PCT_own 39.615710
PCT_pov 42.171354

For low mobility-related cases, we include a refined model (Model 1) and a model that includes all variables suggested by backwards stepwise regression. Though the AIC value is higher for Model 1, all variables included are significant highly significant. For our refined model, we found that low mobility-related311 reporting is 78% MORE LIKELY in majority black tracts; both factors are visualized in Figure 7.

Figure 7. Mobility-Related Cases and Majority Black Census Tracts

V. Conclusion

Comparing models that we have developed based on demographic data to models based on injury variables, we cannot say that demographic data is an accurate predictor of the presence of High-Injury Network streets. However, our models, whether for High-Injury Network presence, low-mobility 311 cases, or gap intersections, do have a level of predictive power. Additionally, there are certainly interesting and clear patterns that become apparent when spatially and graphically examining the demographic distributions alongside the High-Injury Network.

Since Philadelphia’s High-Injury Network is mainly based on vehicle crash data, including injuries and fatalities, we are basing the accuracy of our models on the trustworthiness of the city’s data. The efficiacy of High-Injury Networks depends on their accuracy, but crash data is also often flawed; for instance, bicycle injuries and crashes are vastly underreported. This may signify a need to include wider variables outside of just injury data when developing HINs, or do further investigation to ensure that data is accurate. Furthermore, the fact that several variables, both binary and continuous, in our dataset are highly significant when predicting for the presence of High-Injury Networks indicates the importance of utilizing demographic data and 311 data, both in the development of HINs for other cities or when conducting analysis on where to prioritize transportation infrastructure investments.

Sources

City of Philadelphia. (2022a). OpenDataPhilly. OpenDataPhilly. https://www.opendataphilly.org/

City of Philadelphia. (2022b). Vision Zero Philadelphia. Vision Zero. http://visionzerophl.com/

Conkle, A., & West, C. (2008). Psychology on the Road. APS Observer, 21. https://www.psychologicalscience.org/observer/psychology-on-the-road

Coulson, N. E., & Li, H. (2013). Measuring the external benefits of homeownership. Journal of Urban Economics, 77, 57–67. https://doi.org/10.1016/j.jue.2013.03.005

O’Brien, D. T. (2018). The urban commons: How data and technology can rebuild our communities. Harvard University Press.

O’Brien, D. T., Gordon, E., & Baldwin, J. (2014). Caring about the community, counteracting disorder: 311 reports of public issues as expressions of territoriality. Journal of Environmental Psychology, 40, 320–330. https://doi.org/10.1016/j.jenvp.2014.08.003

Pennsylvania Department of Transportation. (2022). Pennsylvania Crash Information Tool. Pennsylvania Department of Transportation. https://crashinfo.penndot.gov/PCIT/welcome.html

U.S. Census Bureau. (2010-2019). American Community Survey 5-year Estimates.